Abstract: Analyzing a distributed computation is a hard problem in general due to the combinatorial explosion in the size of the state space with the number of processes in the system. By abstracting the computation, unnecessary state explorations can be avoided. Computation slicing is an approach for abstracting distributed computations with respect to a given predicate. We focus on regular predicates, a family of predicates that covers many commonly used predicates for runtime verification. The existing algorithms for computation slicing are centralized: a single process is responsible for computing the slice in either an offline or an online manner. In this paper, we present the first distributed online algorithm for computing the slice of a distributed computation with respect to a regular predicate. Our algorithm distributes the work and storage requirements across the system, thus reducing the space and computation complexity per process.

I. Introduction 

Global predicate detection [1] for runtime verification is an important technique for detecting violations of invariants for debugging and fault tolerance in distributed systems. It is a challenging task on a large system with a large number of processes due to the combinatorial explosion of the state space. The predicate detection problem is applicable not only to conventional distributed systems but also to multicore computing. With the growing popularity of processing chips with a large number of CPU cores [2], some manufacturers are exploring a distributed computing model on chip, with no shared memory between the cores and with information exchanged between cores only through message passing [3]. Recent research efforts [4] have shown that, with sufficiently fast on-chip networking support, such a message-passing model can provide significantly better performance for some specific computational tasks. With the emergence of these trends, techniques in predicate detection for distributed systems can also be useful for systems with a large number of cores.

Multiple algorithms have been proposed in the literature for detection of global predicates in both an offline and an online manner (e.g., [4], [5], [6]). Online predicate detection is important for many system models: continuous services (such as web servers), collection of continuous observations (such as sensor networks), and parallel search operations on large clusters. However, performing predicate detection in a manner that is oblivious to the structure of the predicate can lead to long running times and high memory overhead. The approach of using mathematical abstractions for designing and analyzing computational tasks has proved to be significantly advantageous in modern computing. In the context of predicate detection and runtime verification, the problem of abstraction can be viewed as the problem of taking a distributed computation as input and outputting a smaller distributed computation that abstracts out parts that are not relevant to the predicate under consideration. The abstract computation may be exponentially smaller than the original computation, resulting in significant savings in predicate detection time.

Computation slicing is an abstraction technique for efficiently finding all global states of a distributed computation that satisfy a given global predicate, without explicitly enumerating all such global states [5]. The slice of a computation with respect to a predicate is a sub-computation that satisfies the following properties: (a) it contains all global states of the computation for which the predicate evaluates to true, and (b) of all the sub-computations that satisfy condition (a), it has the least number of global states. As an illustration, consider the computation shown in Fig. 1(a). The computation consists of three processes $P_1$, $P_2$, and $P_3$ hosting integer variables $x_1$, $x_2$, and $x_3$, respectively. An event, represented by a circle, is labeled with the value of the variable immediately after the event is executed.

Suppose we want to determine whether the property (or the predicate) $(x_1 * x_2 + x_3 < 5) \wedge (x_1 > 1) \wedge (x_3 < 3)$ ever holds in the computation. In other words, does there exist a global state of the computation that satisfies the predicate? The predicate could represent the violation of an invariant. Without computation slicing, we are forced to examine all global states of the computation, twenty-eight in total, to ascertain whether some global state satisfies the predicate. Alternatively, we can automatically compute a slice of the computation with respect to the predicate $(x_1 > 1) \wedge (x_3 < 3)$, as shown in Fig. 1(b).

We can now restrict our search to the global states of the slice, which are only six in number, namely:
{a, e, f, u, v}, {a, e, f, u, v, b}, {a, e, f, u, v, w}, {a, e, f, u, v, b, w}, {a, e, f, u, v, w, g}, and {a, e, f, u, v, b, w, g}.

The slice has far fewer global states than the computation itself (exponentially fewer in many cases), resulting in substantial savings.

Fig. 1: A Computation, and its slice with respect to predicate $(x_1 > 1) \wedge (x_3 < 3)$

In this paper, we focus on abstracting distributed computations with respect to regular predicates (defined in Sec. II). The family of regular predicates contains many useful predicates that are often used for runtime verification in distributed systems. Some such predicates are:

Conjunctive Predicates: Global predicates that are conjunctions of local predicates. For example, predicates of the form $B \equiv (l_1 \le x_1 \le u_1) \wedge (l_2 \le x_2 \le u_2) \wedge \ldots \wedge (l_n \le x_n \le u_n)$, where $x_i$ is the local variable on process $P_i$, and $l_i$, $u_i$ are constants, are conjunctive predicates. Some useful verification predicates that are in conjunctive form are: detecting mutual exclusion violations in a pairwise manner, pairwise data-race detection, detecting whether each process has executed some instruction, etc.

Monotonic Channel Predicates [7]: Some examples are: all messages have been delivered (or all channels are empty), at least k messages have been sent/received, there are at most k messages in transit between two processes, the leader has sent all "prepare to commit" messages, etc.

Centralized offline [8] and online [9] algorithms for slicing-based predicate detection have been presented previously. In this paper, we present the first distributed online slicing algorithm for regular predicates in distributed systems.

A. Challenges and Contributions 

Computing the slice of a computation with respect to a general predicate is an NP-hard problem [8]. Many classes of predicates have been identified for which the slice can be computed efficiently in polynomial time (e.g., regular predicates, co-regular predicates, linear predicates, relational predicates, stable predicates) [8], [5], [9], [10]. However, the existing slicing algorithms are centralized in nature. The slice is computed by a single slicer process that examines every relevant event in the computation. The centralized algorithms may be offline, where all events are known a priori, or online, where the slice is updated incrementally with the arrival of every new relevant event. For systems with a large number of processes, such centralized algorithms require a single process to perform a large number of computations and to store a very large amount of data. In comparison, a distributed online algorithm significantly reduces the per-process costs for both computation and storage. Distributed algorithms can be faster than centralized algorithms and allow division of tasks among multiple processes. Additionally, for predicate detection, the centralized online algorithm requires at least one message to the slicer process for every relevant event in the computation, resulting in a bottleneck at the slicer process.

A method of devising a distributed algorithm from a centralized algorithm is to decompose the centralized execution steps into multiple steps to be executed by each process independently. However, for performing online abstraction using computation slicing, such an incremental modification would lead to a large number of messages, computational steps, and memory overhead. The reason for this inefficiency is that, by directly decomposing the steps of the centralized online algorithm, the slicing computation would require each process to send its local state information to all the other processes whenever its local state (or state interval) is updated. In addition, merely splitting the centralized algorithm across all processes leads to a distributed algorithm that wastes significant computational time, as multiple processes may end up visiting (and enumerating) the same global state. Thus, the task of devising an efficient distributed algorithm for slicing is non-trivial. In this paper, we propose a distributed algorithm that exploits not only the nature of the predicates but also the collective knowledge across processes. The optimized version of our algorithm reduces the required storage per slicing process, and the computational workload per slicing process, by $O(n)$. An experimental evaluation comparing our proposed approach with the centralized approach can be found in the extended version of this paper [11].



B. Applications 

Our algorithm is useful for global predicate detection. Suppose the predicate $B$ is of the form $B_1 \wedge B_2$, where $B_1$ is regular but $B_2$ is not. We can use our algorithm to slice with respect to $B_1$ to reduce the time and space complexity of global predicate detection. Instead of searching the original computation for a global state that satisfies $B$, with the distributed algorithm we can search only the global states in the slice with respect to $B_1$. For example, the Cooper-Marzullo algorithm traverses the lattice of global states in an online manner [1], which can be quite expensive. By running our algorithm together with the Cooper-Marzullo algorithm, the space and time complexity of predicate detection can be reduced significantly (possibly exponentially) for predicates of the above-mentioned form.

Our algorithm is also useful for recovery of distributed programs based on checkpointing. For fault tolerance, we may want to restore a distributed computation to a checkpoint that satisfies required properties such as "all channels are empty" and "all processes are in states that have been saved on storage". If we compute the slice of the computation in an online fashion, then on a fault, processes can restore the global state that corresponds to the component-wise maximum of the last slice vectors output at the surviving processes. Since the predicate is regular, this global state (the join of cuts that satisfy the predicate) is consistent as well as recoverable from storage.
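As a minimal sketch of the recovery rule above, assuming each surviving slicer's last reported slice vector is available as a tuple, the restored state is simply the component-wise maximum (the join) of those vectors. The function name `recovery_cut` is hypothetical, and the sample vectors are the last outputs $J_B(c) = [3,2]$ and $J_B(g) = [2,3]$ for the computation of Fig. 2 (Section II) with B = "all channels empty".

```python
from typing import List, Sequence, Tuple


def recovery_cut(last_vectors: List[Sequence[int]]) -> Tuple[int, ...]:
    """Join (component-wise max) of the last slice vector at each surviving slicer."""
    if not last_vectors:
        raise ValueError("need at least one surviving slicer")
    return tuple(max(column) for column in zip(*last_vectors))


# Hypothetical last outputs for Fig. 2: J_B(c) at S_1 and J_B(g) at S_2.
restore = recovery_cut([(3, 2), (2, 3)])
print(restore)  # (3, 3), i.e., the cut [c, g]
```

The resulting cut [c, g] is one of the cuts marked True for B in Table I, as regularity guarantees for a join of satisfying cuts.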



II. Background: Regular Predicates and Slicing 
A. Model 

We assume a loosely coupled asynchronous message-passing system, consisting of n reliable processes (that do not fail), denoted by $\{P_1, P_2, \ldots, P_n\}$, without any shared memory or global clock. Channels are assumed to be FIFO and lossless. In our model, each local state change is considered an event, and every message activity (send or receive) is also represented by a new event. We assume that the computation being analyzed does not deadlock.

A distributed computation is modeled as a partial order on a set of events [12], given by Lamport's happened-before ($\rightarrow$) relation [12]. We use $(E, \rightarrow)$ to denote the distributed computation on a set of events $E$. Mattern [13] and Fidge [14] proposed vector clocks, an approach for time-stamping events in a computation such that the happened-before relation can be tracked. If $e.V$ denotes the vector clock of an event $e$ in a distributed computation, then for any event $f$ in the computation: $e \rightarrow f \Leftrightarrow e.V < f.V$. For any pair of events $e$ and $f$ such that $e \not\rightarrow f \wedge f \not\rightarrow e$, $e$ and $f$ are said to be concurrent, and this relation is denoted by $e \| f$. Fig. 2(a) shows a sample distributed computation, and its corresponding vector clock representation is presented in Fig. 2(b).
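The vector clock test for happened-before and concurrency can be written directly from the definitions above. This is a small illustrative sketch: the function names are hypothetical, and the vector clocks assume the computation of Fig. 2 has events a, b, c on $P_1$, events e, f, g on $P_2$, and a single message from b to f (each clock equals the least consistent cut containing the event).

```python
from typing import Dict, Sequence, Tuple


def happened_before(u: Sequence[int], v: Sequence[int]) -> bool:
    """e -> f iff e.V < f.V: every component is <= and the vectors differ."""
    return all(a <= b for a, b in zip(u, v)) and tuple(u) != tuple(v)


def concurrent(u: Sequence[int], v: Sequence[int]) -> bool:
    """e || f iff neither event happened-before the other."""
    return not happened_before(u, v) and not happened_before(v, u)


# Assumed vector clocks for Fig. 2 (component 0 is P_1, component 1 is P_2).
VC: Dict[str, Tuple[int, int]] = {
    "a": (1, 0), "b": (2, 0), "c": (3, 0),
    "e": (0, 1), "f": (2, 2), "g": (2, 3),
}

print(happened_before(VC["b"], VC["f"]))  # True: b sends the message f receives
print(concurrent(VC["c"], VC["f"]))       # True
```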

We now present some required concepts: 

Definition 1 (Consistent Cut). Given a distributed computation $(E, \rightarrow)$, a subset of events $C \subseteq E$ is said to form a consistent cut if $C$ contains an event $e$ only if it contains all events that happened-before $e$. Formally, $e \in C \wedge f \rightarrow e \Rightarrow f \in C$.

The concept of a consistent cut (or a consistent global state) is identical to that of a down-set (or order-ideal) used in lattice theory [15]. Intuitively, a consistent cut captures the notion of a global state of the system at some point during its execution [16].

Consider the computation shown in Fig. 2(a). The subset of events {a, b, e} forms a consistent cut, whereas the subset {a, e, f} does not, because b → f (b happened-before f) but b is not included in the subset. A consistent cut can also be represented with a vector clock notation. For any consistent cut $C$, its vector clock $C.V$ can be computed as $C.V = \text{component-wise-max}\{e.V \mid e \in C\}$, where $e.V$ denotes the vector clock of event $e$. For this paper, we use a shortened notation for a cut of the computation: a cut $C$ is denoted by the latest events on each process. Thus, {a, b, e} is denoted by [b, e], and {a, e, f} is represented as [a, f].
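A minimal sketch of the cut notions above, assuming the Fig. 2 vector clocks (a single message b → f) and representing a cut by how many events it contains from each process. The helper names `cut_vector` and `is_consistent` are hypothetical. A cut is consistent exactly when the vector clock of every included event is dominated by the cut, i.e., all of the event's causal predecessors are included.

```python
from itertools import product
from typing import Dict, List, Tuple

VC: Dict[str, Tuple[int, int]] = {
    "a": (1, 0), "b": (2, 0), "c": (3, 0),
    "e": (0, 1), "f": (2, 2), "g": (2, 3),
}
EVENTS: List[List[str]] = [["a", "b", "c"], ["e", "f", "g"]]  # per-process order


def cut_vector(events) -> Tuple[int, ...]:
    """C.V = component-wise max of e.V over the events in the cut."""
    n = len(EVENTS)
    return tuple(max((VC[e][k] for e in events), default=0) for k in range(n))


def is_consistent(cut: Tuple[int, ...]) -> bool:
    """cut[p] = number of events of P_p included; every included event's clock
    must be dominated by the cut."""
    return all(
        all(VC[e][k] <= cut[k] for k in range(len(cut)))
        for p, count in enumerate(cut)
        for e in EVENTS[p][:count]
    )


all_cuts = list(product(*(range(len(evs) + 1) for evs in EVENTS)))
consistent = [c for c in all_cuts if is_consistent(c)]
print(cut_vector({"a", "b", "e"}))  # (2, 1), i.e., [b, e]
print(is_consistent((1, 2)))        # False: [a, f] contains f but not b
print(len(consistent))              # 12, matching the rows of Table I
```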

Table I shows all the consistent global states/cuts and their corresponding vector clock values for the computation in Fig. 2. We now present additional notions from lattice theory that are key to our approach.

Definition 2 (Join). A join of a pair of global states is defined 
as the set-union of the set of events in the states. 

Definition 3 (Meet). A meet of a pair of global states is 
defined as the set-intersection of the set of events in the states. 



For two global states $C_1$ and $C_2$, their join is denoted by $C_1 \cup C_2$, whereas $C_1 \cap C_2$ denotes their meet.
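Because the events of each process are totally ordered, a cut is determined by how many events it contains from each process, so the set-union and set-intersection of Definitions 2 and 3 reduce to the component-wise maximum and minimum of the cut vectors. A minimal sketch, with function names chosen for illustration and cut vectors taken from Table I:

```python
from typing import Sequence, Tuple


def join(c1: Sequence[int], c2: Sequence[int]) -> Tuple[int, ...]:
    """Set-union of two cuts = component-wise max of their vectors."""
    return tuple(max(a, b) for a, b in zip(c1, c2))


def meet(c1: Sequence[int], c2: Sequence[int]) -> Tuple[int, ...]:
    """Set-intersection of two cuts = component-wise min of their vectors."""
    return tuple(min(a, b) for a, b in zip(c1, c2))


ce, bg = (3, 1), (2, 3)  # [c, e] and [b, g]
print(join(ce, bg))  # (3, 3), i.e., [c, g]
print(meet(ce, bg))  # (2, 1), i.e., [b, e]
```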

Theorem 1 ([15], [13]). Let $\mathcal{C}(E)$ denote the set of all consistent cuts of a computation $(E, \rightarrow)$. $\mathcal{C}(E)$ forms a lattice under the relation $\subseteq$.

A global predicate (or simply a predicate) is a boolean-valued function on variables of processes. Given a consistent cut, a predicate is evaluated on the state resulting after executing all events in the cut. A global predicate is local if it depends only on variables of a single process. If a predicate $B$ evaluates to true for a consistent cut $C$, we say that "$C$ satisfies $B$" and denote it by $C \models B$.

Definition 4 (Linearity Property of Predicates). A predicate $B$ is said to have the linearity property if, for any consistent cut $C$ that does not satisfy $B$, there exists a process $P_i$ such that a cut that satisfies $B$ can never be reached from $C$ without advancing along $P_i$.

Predicates that have the linearity property are called linear predicates.

For example, consider the cut [b, e] of the computation shown in Fig. 2(a). The cut does not satisfy the predicate "all channels are empty", and for the given cut, progress must be made on $P_2$ to reach the cut [b, f], which satisfies the predicate.

The process $P_i$ in the above definition is called a forbidden process. For a computation involving n processes, given a consistent cut that does not satisfy the predicate, $P_i$ can be found in $O(n)$ time for most linear predicates used in practice. To find a forbidden process given a consistent cut, a process first checks whether the cut needs to be advanced on itself; if not, it checks the states in the total order defined by process ids and picks the first process whose state makes the predicate false on the cut. The set of linear predicates has a subset, the set of regular predicates, that exhibits a stronger property.
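For the channel predicate "all channels empty", a forbidden process is easy to identify: if some message has been sent inside the cut but not yet received, no cut that satisfies the predicate can be reached without advancing the receiver. The sketch below is a simplified, hypothetical procedure for this one predicate only (it returns the receiver of the first in-transit message it finds rather than following the process-id order described above); processes are 0-indexed, so index 1 is $P_2$, and the message list assumes Fig. 2's single message b → f.

```python
from typing import List, Optional, Sequence, Tuple

# (sender, send_index, receiver, recv_index); indices are 1-based event positions.
Message = Tuple[int, int, int, int]
MESSAGES: List[Message] = [(0, 2, 1, 2)]  # b (2nd event of P_1) -> f (2nd event of P_2)


def forbidden_process(cut: Sequence[int], messages: List[Message]) -> Optional[int]:
    """Return a process that must advance for B = "all channels empty", or None if B holds."""
    for sender, s_idx, receiver, r_idx in messages:
        if cut[sender] >= s_idx and cut[receiver] < r_idx:  # message in transit
            return receiver
    return None


print(forbidden_process((2, 1), MESSAGES))  # 1, i.e., P_2 for the cut [b, e]
print(forbidden_process((2, 2), MESSAGES))  # None: [b, f] satisfies B
```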

Definition 5 (Regular Predicates). A predicate is called regular if, for any two consistent cuts $C$ and $D$ that satisfy the predicate, the consistent cuts given by $C \cap D$ (the meet of $C$ and $D$) and $C \cup D$ (the join of $C$ and $D$) also satisfy the predicate.

TABLE I: Consistent Global States of Fig. 2 and Predicate Evaluation for B = "all channels empty"

| State | Cut   | Vec. Clock | Pred. Eval. |
|-------|-------|------------|-------------|
| 1     | []    | 0,0        | True        |
| 2     | [a]   | 1,0        | True        |
| 3     | [b]   | 2,0        | False       |
| 4     | [c]   | 3,0        | False       |
| 5     | [e]   | 0,1        | True        |
| 6     | [a,e] | 1,1        | True        |
| 7     | [b,e] | 2,1        | False       |
| 8     | [b,f] | 2,2        | True        |
| 9     | [b,g] | 2,3        | True        |
| 10    | [c,e] | 3,1        | False       |
| 11    | [c,f] | 3,2        | True        |
| 12    | [c,g] | 3,3        | True        |

Fig. 2: A Computation, Vector Clock Representation, and Slice with respect to predicate B = "all channels are empty"



Examples of regular predicates include local predicates (e.g., $x < 4$), conjunctions of local predicates (e.g., $(x < 4) \wedge (y > 2)$, where x and y are variables on different processes), and monotonic channel predicates (e.g., there are at most k messages in transit from $P_i$ to $P_j$) [8]. Table I indicates whether or not the predicate "all channels empty" is satisfied by each of the consistent global cuts of the computation in Fig. 2. To use computation slicing for detecting regular predicates, we first need to capture the notion of join-irreducible elements for the lattice of consistent cuts.
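The entries of Table I, and the regularity of B on this example, can be checked by brute force. The sketch is illustrative only (enumerating all cuts is exponential in general, which is exactly what slicing avoids); it assumes Fig. 2 has exactly one message, from b (second event of $P_1$) to f (second event of $P_2$), uses 0-based process indices, and all names are hypothetical.

```python
from itertools import product
from typing import List, Set, Tuple

Cut = Tuple[int, int]
MESSAGES = [(0, 2, 1, 2)]  # b -> f as (sender, send index, receiver, receive index)


def consistent(cut: Cut) -> bool:
    # Inconsistent iff some message is received inside the cut but sent outside it.
    return all(not (cut[r] >= ri and cut[s] < si) for s, si, r, ri in MESSAGES)


def channels_empty(cut: Cut) -> bool:
    # B holds iff no message is sent inside the cut but received outside it.
    return all(not (cut[s] >= si and cut[r] < ri) for s, si, r, ri in MESSAGES)


cuts: List[Cut] = [c for c in product(range(4), repeat=2) if consistent(c)]
sat: Set[Cut] = {c for c in cuts if channels_empty(c)}


def is_closed_under_meet_and_join(sat_set: Set[Cut]) -> bool:
    """Definition 5 restricted to this finite example."""
    for c in sat_set:
        for d in sat_set:
            low = tuple(min(x, y) for x, y in zip(c, d))
            high = tuple(max(x, y) for x, y in zip(c, d))
            if low not in sat_set or high not in sat_set:
                return False
    return True


print(len(cuts), len(sat))                  # 12 8
print(is_closed_under_meet_and_join(sat))   # True
```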

Definition 6 (Join-Irreducible Element). Let $L$ be a lattice. An element $x \in L$ is join-irreducible if

1) $x$ is not the smallest element of $L$, and

2) $\forall a, b \in L: (x = a \cup b) \Rightarrow (x = a) \vee (x = b)$.

Intuitively, a join-irreducible element of a lattice is one that cannot be represented as the join of two elements of the lattice that are both different from itself. For the lattice of consistent cuts of a distributed computation, the join-irreducible elements correspond to consistent cuts that cannot be obtained as joins (set-unions) of two or more other consistent cuts. For the computation of Fig. 2, the join-irreducible consistent cuts are: [a], [b], [c], [e], [b, f], [b, g].
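Definition 6 can be applied directly to the twelve consistent cuts of Fig. 2. This brute-force sketch (hypothetical names, quadratic in the lattice size per element) assumes the single message b → f and represents cuts by per-process event counts, so the join is the component-wise maximum.

```python
from itertools import product
from typing import List, Set, Tuple

Cut = Tuple[int, ...]


def consistent(cut: Cut) -> bool:
    # Fig. 2: f (2nd event of P_2) receives the message sent by b (2nd event of P_1).
    return not (cut[1] >= 2 and cut[0] < 2)


def join(c: Cut, d: Cut) -> Cut:
    return tuple(max(x, y) for x, y in zip(c, d))


def join_irreducibles(lattice: List[Cut]) -> Set[Cut]:
    """Elements that are not the bottom and are not the join of two other elements."""
    bottom = tuple(min(col) for col in zip(*lattice))
    result: Set[Cut] = set()
    for x in lattice:
        if x == bottom:
            continue
        others = [y for y in lattice if y != x]
        if not any(join(a, b) == x for a in others for b in others):
            result.add(x)
    return result


lattice_cuts = [c for c in product(range(4), repeat=2) if consistent(c)]
print(sorted(join_irreducibles(lattice_cuts)))
# [(0, 1), (1, 0), (2, 0), (2, 2), (2, 3), (3, 0)] -> [e], [a], [b], [b, f], [b, g], [c]
```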

B. Computation Slice 

A computation slice of a computation with respect to a predicate $B$ is a concise representation of all the consistent cuts of the computation that satisfy $B$. When the predicate $B$ is regular, the set $L_B$ of consistent cuts satisfying $B$ forms a sublattice of $L$, the lattice of all consistent cuts of the computation $(E, \rightarrow)$. $L_B$ can equivalently be represented using its join-irreducible elements [15]. Intuitively, join-irreducible elements form the basis of the lattice: the lattice can be generated by taking joins of its basis elements. Let $J_B$ be the set of all join-irreducible elements of $L_B$, and let $J_B(e)$ denote the least consistent cut that includes $e$ and satisfies $B$. Then, it can be shown [17] that

$$J_B = \{ J_B(e) \mid e \in E \}$$

The $J_B(e)$ values, in vector clock notation, for the events of the computation in Fig. 2 are:

$J_B(a) = [1,0]$, $J_B(b) = [2,2]$, $J_B(c) = [3,2]$, $J_B(e) = [0,1]$, $J_B(f) = [2,2]$, $J_B(g) = [2,3]$. We can now define a computation slice formally.



Definition 7. Let $(E, \rightarrow)$ be a computation and $B$ be a regular predicate. The slice of the computation with respect to $B$ is defined as $(J_B, \subseteq)$.

The definition given here is different from the one given in 
iflTl and HI but equivalent for regular predicates as shown in 



Note: It is important to observe that $J_B(e)$ does not necessarily exist. Also, multiple events may have the same $J_B(e)$.
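The $J_B(e)$ values listed above can be reproduced by brute force: collect every consistent cut that contains e and satisfies B, take their meet, and confirm that the meet is itself such a cut (for a regular B it always is). This is a sketch under the same assumptions as before (Fig. 2 with the single message b → f, 0-based process indices, hypothetical names); it also shows that b and f share the same $J_B$ value.

```python
from itertools import product
from typing import Dict, Optional, Tuple

Cut = Tuple[int, ...]
# (process index, 1-based position) of each event of Fig. 2.
POS: Dict[str, Tuple[int, int]] = {
    "a": (0, 1), "b": (0, 2), "c": (0, 3), "e": (1, 1), "f": (1, 2), "g": (1, 3),
}


def consistent(cut: Cut) -> bool:
    return not (cut[1] >= 2 and cut[0] < 2)  # f needs b


def channels_empty(cut: Cut) -> bool:
    return not (cut[0] >= 2 and cut[1] < 2)  # b sent but f not yet received


SAT = [c for c in product(range(4), repeat=2) if consistent(c) and channels_empty(c)]


def J_B(event: str) -> Optional[Cut]:
    """Least consistent cut containing `event` that satisfies B, or None if none exists."""
    p, idx = POS[event]
    candidates = [c for c in SAT if c[p] >= idx]
    if not candidates:
        return None
    least = tuple(min(col) for col in zip(*candidates))  # meet of all candidates
    return least if least in candidates else None


for ev in "abcefg":
    print(ev, J_B(ev))
```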

For the computation shown in Fig. 2(a), Fig. 2(c) presents a visual representation of the slice.

A centralized online algorithm to compute $J_B$ was proposed in [9]. In this algorithm, a pre-identified process, called the slicer process, plays the role of the slice-computing process. All the processes in the system send their event and local state values to the slicer whenever their local states change. The slicer process maintains a queue of events for each process in the system and, on receiving data from a process, adds the event to the relevant queue. In addition, the slicer process keeps a map of events and corresponding local states for each process in the system. For each received event, the slicer appends the event and local state mapping to the respective map. For every event e it receives, the slicer computes $J_B(e)$ using the linearity property.

The centralized approach suffers from the drawback of placing a heavy load of messages as well as computation on just one process, namely the slicer process. Thus, for any large distributed computation, this approach would not scale well. To address this issue, we propose a distributed algorithm that significantly reduces the computational as well as the message load on any single process.

III. A Distributed Online Algorithm for Slicing

In this section, we present the key ideas and routines of the distributed online algorithm for computing the slice. The optimizations that tackle the challenges listed in Section I-A are discussed later. In our algorithm, we have n slicer processes, $S_1, S_2, \ldots, S_n$, one for every application process. All slicer processes cooperate to perform the task of slicing $(E, \rightarrow)$. Let $E$ be partitioned into n sets $E_i$ such that $E_i$ is the set of events that occurred in $P_i$. In our algorithm, $S_i$ computes

$$J_i(B) = \{ J_B(e) \mid e \in E_i \}$$

Observe that, by the definition of $J_B(e)$, $e \rightarrow f$ implies $J_B(e) \subseteq J_B(f)$. Since all events in a process are totally ordered, the set of consistent cuts generated by any $S_i$ is also totally ordered. Note: In this paper, the symbol $\rightarrow$ indicates a happened-before relation, whereas the symbol $\leftarrow$ in the pseudo-code denotes the assignment operation.
Algorithm 1 presents the distributed algorithm for online slicing with respect to a regular predicate $B$. Each slicer process has a token assigned to it that goes around in the system. Other slicer processes cooperate in maintaining and processing the token. The goal of the token for the slicer process $S_i$ is to compute $J_B(e)$ for all events $e \in E_i$. Whenever the token has computed $J_B(e)$, it returns to its original process, reports $J_B(e)$, and starts computing $J_B(succ(e))$, $succ(e)$ being the immediate successor of event $e$. The token $T_i$ carries with it the following data:

• pid: Process id of the slicer process to which it belongs.

• event: Details of event e at $P_i$ for which this token is computing $J_B(e)$, specifically the event id and the event's vector clock. The identifier for event e is the tuple <pid, eid>, which identifies each event in the computation uniquely.

• gcut: The vector clock corresponding to the cut that is under consideration (a candidate for $J_B(e)$).

• depend: Dependency vector for events in gcut. The dependency vector is updated each time the information of an event is added to the token (steps explained later), and is used to decide whether or not the cut being considered is consistent. On any token, its vector gcut is a consistent global state iff for all i, depend[i] ≤ gcut[i].

• gstate: Vector representation of the global state corresponding to vector gcut. It is sufficient to keep only the states relevant to the predicate B.

• eval: Evaluation of B on gstate. The evaluation is either true or false; in our notation we use the values {predtrue, predfalse}.

• target: A pointer to the unique event in the computation for which a token has to wait. The event need not belong to the local process.
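The token fields above can be mirrored by a small record type. The following is a hypothetical Python rendering, not the paper's data layout: the class name `Token`, the extra field `n` (number of processes), 0-based list positions (position k-1 holds the component for $P_k$), and the use of `True`/`False` for predtrue/predfalse are all assumptions. It includes the consistency test stated for `depend`.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional, Tuple

EventId = Tuple[int, int]  # <pid, eid>


@dataclass
class Token:
    pid: int                                         # owner slicer
    n: int                                           # number of processes
    event: Optional[EventId] = None                  # e for which J_B(e) is computed
    gcut: List[int] = field(default_factory=list)    # candidate cut, one eid per process
    depend: List[int] = field(default_factory=list)  # max of added events' vector clocks
    gstate: List[Any] = field(default_factory=list)  # predicate-relevant local states
    eval: Optional[bool] = None                      # True ~ predtrue, False ~ predfalse
    target: Optional[EventId] = None                 # event the token waits for

    def __post_init__(self) -> None:
        self.gcut = self.gcut or [0] * self.n
        self.depend = self.depend or [0] * self.n
        self.gstate = self.gstate or [None] * self.n

    def is_consistent(self) -> bool:
        """gcut is a consistent global state iff depend[k] <= gcut[k] for all k."""
        return all(d <= g for d, g in zip(self.depend, self.gcut))


t = Token(pid=2, n=2)            # T_2 in Fig. 2, right after f = <2, 2> is added
t.gcut, t.depend = [0, 2], [2, 2]
print(t.is_consistent())         # False: f depends on b, which gcut does not yet include
```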

A token waits at a slicer process $S_i$ under three specific conditions:

(C1) The token is for process $S_i$, it has computed $J_B(pred(e))$, $pred(e)$ being the immediate predecessor event of e, and it is waiting for the arrival of e.

(C2) The token is for process $S_i$ and it is computing $J_B(f)$, where f is an event on $P_i$ prior to e. The computation of $J_B(f)$ requires the token to advance along process $P_i$.

(C3) The token is for process $S_j$ such that $j \neq i$, and it is computing $J_B(f)$ for some event f on $P_j$, which requires the token to advance along process $P_i$.

On occurrence of each relevant event $e \in E_i$, the computation process $P_i$ performs a local enqueue to slicer $S_i$ with the details of this event. Note that $P_i$ and its slicer $S_i$ are modeled as two threads on the same process, and therefore the local enqueue is simply an insertion into the queue of the slicer, which is shared between the threads on the same process. The message contains the details of event e, i.e., the event identifier <pid, eid>, the corresponding vector clock e.V, and $P_i$'s local state $localstate_e$ corresponding to e.

Algorithm 1: Algorithm at $S_i$

 1  ReceiveEvent(Event e, State localstate_e)
 2    save <e.eid, localstate_e> in local state map procstates
 3    foreach waiting token t at S_i do
 4      if (t.target = e) then                     // t waiting for event e
 5        AddEventToToken(t, e)
 6        ProcessToken(t)
 7      end
 8    end
 9  AddEventToToken(Token t, Event e)
10    t.gstate[e.pid] ← procstates[e.eid]
11    t.gcut[e.pid] ← e.eid
12    if (t.pid = i) then                          // my token: update token's event pointer
13      t.event ← e
14    end
15    t.depend ← max(t.depend, e.V)                // set causal dependency
16  ProcessToken(Token t)
17    if (t.gcut is inconsistent) then
        /* find k : t.gcut[k] < t.depend[k] */
18      t.target ← t.gcut[k] + 1                   // set desired event
19      send t to S_k
20    else                                         // t.gcut is consistent
21      EvaluateToken(t)
22    end
23  EvaluateToken(Token t)
24    if B(t.gstate) then                          // B is true on cut given by t.gcut
25      t.eval ← predtrue
26      send t to S_{t.pid}
27    else                                         // B is false on t.gstate
28      t.eval ← predfalse
        /* P_k : forbidden process in t.gstate for B */
29      t.target ← t.gcut[k] + 1
30      send t to S_k
31    end
32  ReceiveToken(Token t)
33    if (t.eval = predtrue) ∧ (t.pid = i) then    // my token, B true
34      output(t.pid, t.eid, t.gcut)
        /* token waits for the next event */
35      t.target ← t.gcut[i] + 1
36      t.waiting ← true
37    else                                         // either inconsistent cut, or predicate false
38      newid ← t.target                           // id of event t requires
39      if (∃ f ∈ localEvents : f.id = newid) then
40        // required event has happened
41        AddEventToToken(t, f)
42        ProcessToken(t)
43      end
44      // else, the token remains in waiting state
45    end
46  ReceiveStopSignal
47    foreach token t : t.pid ≠ i do
48      // not my token, send back to parent
49      send t to S_{t.pid}
50    end

The steps of the presented routines are explained below:

ReceiveEvent (Lines 1-8): On receiving the details of event e from $P_i$, $S_i$ adds them to procstates, the map of $P_i$'s local states (line 2). It then iterates over all the waiting tokens and checks their targets. For each token that has e as its target (the event required to make progress), $S_i$ updates the state of the token and then processes it.



AddEventToToken (Lines 9-15): To update the state of some token t on $S_i$, we record $P_i$'s local state in t.gstate[i] (line 10) and advance the candidate cut to include the new event by setting t.gcut[i] to the id of event e. If $S_i$ is the parent process of the token ($T_i$), then the t.event pointer is updated to indicate the event id for which the token is computing the join-irreducible cut that satisfies the predicate. The causal dependency is updated at line 15; it is required for checking whether or not the cut is consistent.



ProcessToken (Lines 16-22): To process any token, $S_i$ first checks that the global state in the token is consistent (line 17) and at least beyond the global states that were earlier evaluated to be false. For t's global cut t.gcut to be consistent, t.gcut must be at least t.depend. This is verified by checking the component-wise values in both vectors. If some index k is found where t.depend[k] > t.gcut[k], the token's cut is inconsistent, and t.gcut must be advanced by at least one event on $P_k$, by sending the token to the slicer of $P_k$. If the cut is consistent, the predicate is evaluated on the variables stored as part of t.gstate by calling the EvaluateToken routine.



EvaluateToken (Lines 23-31): The cut represented by t.gstate is evaluated; if the predicate is true, then the token has computed $J_B(e)$ for the event e = <t.pid, t.eid>. The token is then sent to its parent slicer. If the evaluation of the predicate on the cut is false, the target pointer is updated at line 29, and the token is sent to the slicer of the forbidden process, on which the token must make progress.



ReceiveToken (Lines 32-45): On receiving a token, the slicer checks whether the predicate evaluation on the token is true and the token is owned by the slicer. In such a case, the slicer outputs the cut information and then uses the token to find $J_B(succ(e))$, where $succ(e)$ denotes the event that locally succeeds e. This is done by setting the new event id in t.target at line 35 and then setting the waiting flag (line 36). Otherwise (the cut is inconsistent or the predicate is false), the target pointer of the token points to the event required by the token to make progress. $S_i$ looks for such an event (line 39), and if it has been reported to $S_i$ by $P_i$, adds that event to the token (line 41) and processes it (line 42). In case the desired event has not yet been reported to the slicer process, the token is retained at $S_i$ and is kept in the waiting state until the required event arrives. Upon arrival of the required event, its details are added to the token and the token is processed.

Note: The notation target ← t.gcut[i] + 1 means that if t.gcut[i] holds the event id <pid, eid>, then the target pointer is set to <pid, eid + 1>.
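The core token loop (AddEventToToken, ProcessToken, and EvaluateToken) can be traced sequentially when all events are already available, so no token ever has to wait. The sketch below is an offline simplification with a hypothetical encoding of Fig. 2: 0-based process indices, vector clocks from Table I, and each local state recorded as (messages sent, messages received) so that B = "all channels empty" can be evaluated on gstate alone. It advances one event at a time, as lines 18 and 29 do, and it is not the paper's message-passing implementation.

```python
from typing import Dict, List, Optional, Tuple

# VCS[p][k]: vector clock of the (k+1)-th event of P_p; local state = (sent, received).
VCS = [[(1, 0), (2, 0), (3, 0)], [(0, 1), (2, 2), (2, 3)]]
SENT = [[0, 1, 1], [0, 0, 0]]  # b sends the only message
RECV = [[0, 0, 0], [0, 1, 1]]  # f receives it


def evaluate(gstate: List[Tuple[int, int]]) -> Optional[int]:
    """B for this two-process example (messages flow only from P_1 to P_2).
    Returns None if B holds, else the forbidden process (index 1, i.e., P_2)."""
    return None if gstate[0][0] == gstate[1][1] else 1


def add_event(tok: Dict, p: int, eid: int) -> None:
    """AddEventToToken: move gcut on P_p to event eid, record its state, merge its clock."""
    tok["gcut"][p] = eid
    tok["gstate"][p] = (SENT[p][eid - 1], RECV[p][eid - 1])
    tok["depend"] = [max(d, v) for d, v in zip(tok["depend"], VCS[p][eid - 1])]


def compute_slice_for(i: int) -> List[Optional[Tuple[int, ...]]]:
    """Drive token T_i over every event of P_i; return J_B(e) per event (None if absent)."""
    n = len(VCS)
    tok = {"gcut": [0] * n, "depend": [0] * n, "gstate": [(0, 0)] * n}
    results: List[Optional[Tuple[int, ...]]] = []
    for eid in range(1, len(VCS[i]) + 1):
        add_event(tok, i, eid)  # the token keeps the cut it reached for the previous event
        while True:
            # ProcessToken: a component where the cut misses a causal dependency.
            k = next((q for q in range(n) if tok["gcut"][q] < tok["depend"][q]), None)
            if k is None:
                k = evaluate(tok["gstate"])  # EvaluateToken on a consistent cut
                if k is None:
                    results.append(tuple(tok["gcut"]))
                    break
            if tok["gcut"][k] == len(VCS[k]):  # required event never occurs
                return results + [None] * (len(VCS[i]) - eid + 1)
            add_event(tok, k, tok["gcut"][k] + 1)  # target <- gcut[k] + 1
    return results


print(compute_slice_for(0))  # [(1, 0), (2, 2), (3, 2)]: J_B(a), J_B(b), J_B(c)
print(compute_slice_for(1))  # [(0, 1), (2, 2), (2, 3)]: J_B(e), J_B(f), J_B(g)
```

Both lists agree with the $J_B(e)$ values given in Section II-B.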



ReceiveStopSignal (Lines 46-50): For finite computations, a single-token-based termination detection algorithm is used in tandem. One of the slicer processes, say $S_1$, holds a separate stop token. Whenever $P_1$ is finished with its computation, it sends a signal to $S_1$, and $S_1$ in turn checks whether it has any tokens that it has not yet updated with the local events from $P_1$. Only after all such updates and processing are completed, and there are no more local events to process, does $S_1$ forward the stop token to $S_2$, and so on. When $S_1$ receives the stop token from $S_n$, it can deduce that all the slicer processes have completed processing all the events from their local queues, and that there is no slicing token that can be advanced further. $S_1$ then sends the 'stop' signal to all the slicer processes, including itself. On receiving the 'stop' signal, $S_i$ sends all the slicing tokens that do not belong to it back to their parent processes.
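A minimal timing sketch of the stop-token round, under hypothetical assumptions: each slicer $S_j$ becomes idle (its process has finished and its local queue is drained) at a known integer time `finish_time[j]`, and every hop of the stop token takes a fixed `hop_delay`. It models only the stop token's ring traversal, not the slicing tokens, and shows why $S_1$ can broadcast 'stop' only after every slicer has become idle.

```python
from typing import List


def stop_token_round(finish_time: List[int], hop_delay: int = 1) -> int:
    """Time at which S_1 receives the stop token back from S_n and may broadcast 'stop'.
    Each S_j forwards the token only once it is idle (at time finish_time[j])."""
    t = finish_time[0]                          # S_1 releases the token once idle
    for j in range(1, len(finish_time)):
        t = max(t + hop_delay, finish_time[j])  # arrive at S_j, wait until S_j is idle
    return t + hop_delay                        # S_n forwards the token back to S_1


print(stop_token_round([5, 2, 9]))  # 10: the round cannot finish before S_3 is idle
```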

Note that the routines require atomic updates and reads on the local queues, as well as on tokens present at $S_i$. In the interest of space, we skip presenting the lower-level implementation details, which involve common local synchronization techniques.

A. Example of Algorithm Execution 

This example illustrates the algorithm execution steps for one possible run (real-time observations) of the computation shown in Fig. 2 with respect to the predicate B = "all channels empty". The algorithm starts with two slicing processes $S_1$ and $S_2$, each having a token, $T_1$ and $T_2$ respectively. The target pointer of each token $T_i$ is initialized to the event <i, 1>. When event a is reported, $S_1$ adds its details to $T_1$, finds on evaluation that the predicate "all channels empty" is true, and outputs this information. It then updates the $T_1$.target pointer and waits for the next event to arrive. Similar steps are performed by $S_2$ on $T_2$ when e is reported.

When b is reported to $S_1$ and $T_1$ is evaluated with the updated information, the predicate is false on the state [b]. Since b is a message send event, the corresponding message receive event must also be incorporated for the channel to be empty. Thus, $S_1$ sends $T_1$ to $S_2$ after setting the target pointer to the first event on $P_2$. On receiving $T_1$, $S_2$ fetches the information of its first event (e) and updates $T_1$. The subsequent evaluation still leads to the predicate being false. Thus $S_2$ retains $T_1$ and waits for the next event.

When f is reported, S_2 updates both T_1 and T_2 with f's details. S_2's evaluation of T_1.gstate, represented by [b,f], is true, and as per line 26, T_1 is sent back to S_1, where the consistent cut [b,f] is output. T_1 now waits for the next event. However, after being updated with the details of event f, the resulting cut on T_2 is inconsistent: the message-receive information is present, but the information regarding the corresponding send event is missing. Using the vector clock values, T_2's target is set to the id of the message-send event b, and S_2 sends T_2 to S_1. On receiving T_2, S_1 finds the required event (by looking at T_2.target) and, after updating T_2 with its details, evaluates the token. The predicate is now true on T_2.gstate, and T_2 is sent back to S_2. On receiving T_2, S_2 outputs the consistent cut [b,f] and waits for the next event. When the details of event c are added to the waiting token T_1, the predicate is found to be true again on T_1, and S_1 outputs [c,f]. Likewise, on receiving g, S_2 performs the same steps and outputs [b,g]. Note that the consistent cuts [a,e] and [c,g], both of which satisfy the predicate, are not enumerated. They are not join-irreducible, because they can be constructed as the unions of [a], [e] and of [c,f], [b,g], respectively.
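To make the example concrete, the sketch below recomputes the join-irreducible cuts by brute force rather than with the distributed algorithm. It relies on an assumption, because Fig. 2 is not reproduced here. From the narrative we take it that P_1 executes a, b, c and P_2 executes e, f, g, with a single message sent at b and received at f. A cut is a tuple of per-process event counts. The function `least_cut` (a hypothetical helper) returns J_B(x), the least consistent cut that satisfies B and contains x.

```python
from itertools import product

# Assumed reading of Fig. 2: P_1 runs a, b, c and P_2 runs e, f, g;
# the only message is sent by b and received by f.
# Events are (process, 1-based index); a cut is a tuple of event counts.
EVENTS = {"a": (0, 1), "b": (0, 2), "c": (0, 3),
          "e": (1, 1), "f": (1, 2), "g": (1, 3)}
MESSAGES = [((0, 2), (1, 2))]        # (send event, receive event)
LENGTHS = (3, 3)                     # number of events on each process
NAMES = (("a", "b", "c"), ("e", "f", "g"))


def contains(cut, event):
    p, k = event
    return cut[p] >= k


def consistent(cut):
    # Every receive inside the cut must have its send inside the cut.
    return all(contains(cut, s) for s, r in MESSAGES if contains(cut, r))


def all_channels_empty(cut):
    # Predicate B: every message sent inside the cut is also received in it.
    return all(contains(cut, r) for s, r in MESSAGES if contains(cut, s))


def least_cut(event_name):
    """J_B(x): least consistent cut that satisfies B and contains x."""
    ev = EVENTS[event_name]
    cuts = [c for c in product(*(range(m + 1) for m in LENGTHS))
            if contains(c, ev) and consistent(c) and all_channels_empty(c)]
    if not cuts:
        return None
    least = tuple(min(c[j] for c in cuts) for j in range(len(LENGTHS)))
    assert least in cuts   # regular predicate: closed under intersection
    return least


def show(cut):
    # Paper notation: list the last event of each process that is in the cut.
    return "[" + ",".join(NAMES[j][cut[j] - 1]
                          for j in range(len(cut)) if cut[j] > 0) + "]"


J = {x: show(least_cut(x)) for x in EVENTS}
print(J)
# {'a': '[a]', 'b': '[b,f]', 'c': '[c,f]', 'e': '[e]', 'f': '[b,f]', 'g': '[b,g]'}
```

J_B(b) and J_B(f) coincide, which is the kind of duplicate computation that the optimizations of Section IV target. The cut (1, 1) satisfies B, but it is the union of (1, 0) and (0, 1), so it is not join-irreducible.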

B. Proof of Correctness 

This section proves the correctness and termination of the distributed algorithm in Algorithm 1. The proofs presented here are for finite computations; the correctness argument extends easily to infinite computations.

Lemma 1. The algorithm presented in Algorithm 1 does not deadlock.

Proof: The algorithm involves n tokens, and no token waits for any other token to complete a task. With non-lossy channels and no failing processes, tokens are never lost. The progress of any token depends on its target event. As per lines 4-7, whenever an event is reported to a slicer, the slicer updates all the tokens whose target is this event. Thus, the algorithm cannot lead to deadlocks. ■

Lemma 2. If a token T_i is evaluating J_B(e) for e ∈ E_i, J_B(e) exists, and T_i.gcut < J_B(e), then T_i.gcut is advanced in finite time.

Proof: If T_i.gcut < J_B(e) at any instant during the computation of J_B(e), then there are two possibilities for gcut:

(a) gcut is consistent: The evaluation of predicate B on gcut must be false, since by definition J_B(e) is the least consistent cut that satisfies B and includes e. In this case, by line 29 and the subsequent steps, the token is forced to advance on some process.

(b) gcut is inconsistent: The token is advanced on some process by the execution of lines 17-19. ■

Lemma 3. While evaluating J_B(e) for event e ∈ E_i on token T_i, if currently T_i.gcut < J_B(e) and J_B(e) exists, then the algorithm eventually outputs J_B(e).

Proof: By Lemma 2, the global cut of T_i is advanced in finite time. Given that J_B(e) exists, the linearity property guarantees a process on which T_i must advance its gcut and gstate vectors in order to reach J_B(e). Lines 29-31 ensure that this forbidden process is found and that T_i is sent to it. By the previous lemma, the cut on T_i is advanced until it matches J_B(e). By line 34 of the algorithm, J_B(e) is output as soon as it is reached. ■

Lemma 4. For any token T_i, the algorithm never advances the T_i.gcut vector beyond J_B(e) on any process when searching for J_B(e), for e ∈ E_i.

Proof: The search for J_B(e) starts either with an empty global state vector, or from a global state that is at least J_B(pred(e)), where pred(e) is the immediate predecessor event of e on P_i. Thus, until J_B(e) is reached, the global cut under consideration is always less than J_B(e). By the linearity property of advancing on the forbidden process and by Lemma 2, the cut is advanced in finite time. Whenever the cut reaches J_B(e), it is output as per Lemma 3. The token is then sent back to its parent slicer, either to begin the search for J_B(succ(e)) or to wait for succ(e) to arrive, where succ(e) is the immediate successor of e. Thus, T_i.gcut never advances beyond J_B(e) on any process when searching for J_B(e), for any event e. ■

Lemma 5. If token T_i is currently not at S_i, then T_i returns to S_i in finite time.

Proof: Assume T_i is currently at S_j (j ≠ i). S_j advances T_i.gcut in finite time as per Lemma 2. Since there are no deadlocks (Lemma 1), Lemmas 3 and 4 guarantee the following: if J_B(T_i.event) exists, then within finite time the T_i.gcut vector is advanced to J_B(T_i.event) and T_i is sent back to S_i. If J_B(T_i.event) does not exist, then at least one slicer process S_k runs out of events while attempting to advance T_i.gcut. In such a case, knowing that there are no more events to process, S_k sends T_i back to S_i (lines 46-50). ■

Theorem 2 (Termination). For a finite computation, the algorithm terminates in finite time.

Proof: We first prove that, for any event e ∈ E_i, computing J_B(e) with token T_i takes finite time. By Lemma 2, T_i always advances in finite time while computing J_B(e). If J_B(e) exists, then within finite time the token T_i advances its gcut to J_B(e). By Lemma 3, the algorithm outputs this cut, thus finishing the J_B(e) search, and as per Lemma 4 it does not advance any further for the J_B(e) computation. Thus, if J_B(e) exists, it is output in finite time. By Lemma 5, the token is returned to its parent process, and the J_B(e) computation for e ∈ E_i finishes in finite time.

If J_B(e) does not exist, then, as argued in Lemma 5, some slicer runs out of events to process in the finite computation and returns the token to S_i, which terminates the search for J_B(e). Each of these steps is also guaranteed to finish in finite time by the above lemmas. We conclude that the J_B(e) computation for e ∈ E_i finishes in finite time.

Applying this result to all the events in E guarantees termination in finite time. ■

Theorem 3. The algorithm outputs all the elements of J_B.

Proof: Whenever an event e ∈ E occurs, the process P_i on which it occurs reports it to the corresponding slicer process S_i; thus e ∈ E_i. If S_i holds T_i at the time e is reported, then by Lemmas 2 and 3 the algorithm is guaranteed to output J_B(e), if it exists. If S_i does not hold T_i when e is reported to it, then by Lemma 5, T_i arrives at S_i within finite time. If S_i has any other events in its processing queue before e, then, as per Theorem 2, S_i finishes those computations in finite time as well. Thus, S_i eventually starts the computation of J_B(e) with T_i within finite time. Once this computation is started, Lemmas 2 and 3 apply again and guarantee that the algorithm outputs J_B(e), if it exists.

Repeatedly applying this result to all the events in E, we are guaranteed that the algorithm outputs J_B(e) for every event e ∈ E. Thus the algorithm outputs all the join-irreducible elements of the computation, which by definition together form J_B. ■

Theorem 4. The algorithm outputs only join-irreducible global states that satisfy predicate B.

Proof: By Lemma 4, while performing computations for e ∈ E_i on token T_i, the algorithm does not advance token T_i beyond J_B(e). Since only token T_i is responsible for computing J_B(e) for all the events e ∈ E_i, the algorithm does not advance beyond J_B(e) on any token. To output a global state that is not join-irreducible, the algorithm would have to advance the cut of at least one token beyond a least global state that satisfies B. The result follows from the above assertions. ■



Algorithm 2: SendIfNeeded at S_i

SendIfNeeded(Token t, int k)
  /* k: id of the slicer process to which t should be sent */
  while (k ≠ i) ∧ (have token_k) do
    /* t should be sent to S_k, and S_i has S_k's token */
    if (t.target = token_k.event) then   // token_k has info of t's target event
      t.gcut[k] ← token_k.gcut[k]
      t.depend[k] ← token_k.depend[k]
      t.state[k] ← token_k.state[k]
      if (t.gcut is inconsistent) then   // still inconsistent
        /* find j : t.gcut[j] < t.depend[j] */
        t.target ← t.gcut[j] + 1
        k ← j   // set k for while condition
      else   // t.gcut is consistent now, evaluate
        EvaluateToken(t)
      end
    else   // desired event details not in tokens
      break
    end
  end
  /* desired token or event info not present */
  if (t.target.pid ≠ i) then
    /* target event on some other process */
    send t to S_k
  end



Theorem 2 guarantees termination, and correctness follows from Theorems 3 and 4.

IV. Optimizations 

The distributed algorithm presented in the previous section is not optimized to avoid redundant token messages or duplicate computations. Whenever a slicer process S_i needs to send a token to another process S_k, it should first check whether it currently holds the token T_k and whether the desired information is present in T_k. If the information is available, the token can be updated with it without being sent to S_k. The token is sent to S_k only if the details of the required event are not available locally. These steps are captured in the procedure SendIfNeeded shown in Algorithm 2.
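The local shortcut of SendIfNeeded can be sketched as follows. This is an illustrative rendering under stated assumptions, not the authors' code. Events are encoded as (process id, 1-based index) pairs. The `Token` dataclass is hypothetical and follows the fields used in Algorithm 2. `held` stands for the tokens S_i currently holds, and `send` is an assumed transport callback. Where Algorithm 2 calls EvaluateToken, this sketch simply returns.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

Event = Tuple[int, int]   # (process id, 1-based event index): an assumed encoding


@dataclass
class Token:
    pid: int                          # id of the owning slicer
    gcut: List[int]                   # gcut[j]: last included event index of P_j
    depend: List[int]                 # depend[j]: events of P_j the cut depends on
    gstate: List[object]              # local state recorded for each P_j
    target: Optional[Event] = None    # event this token needs next
    event: Optional[Event] = None     # event whose J_B this token is computing


def send_if_needed(i: int, t: Token, k: int, held: Dict[int, Token],
                   send: Callable[[int, Token], None]) -> Optional[object]:
    """Sketch of SendIfNeeded at slicer S_i, mirroring Algorithm 2.

    Returns "consistent" when the cut was completed from locally held
    tokens, the id of the slicer t was sent to, or None if t stays at S_i.
    """
    while k != i and k in held:
        other = held[k]
        if t.target != other.event:           # S_k's token lacks the event we need
            break
        t.gcut[k] = other.gcut[k]             # copy P_k's component from token_k
        t.depend[k] = other.depend[k]
        t.gstate[k] = other.gstate[k]
        lagging = [j for j in range(len(t.gcut)) if t.gcut[j] < t.depend[j]]
        if not lagging:
            return "consistent"               # Algorithm 2 would call EvaluateToken(t)
        j = lagging[0]                        # still inconsistent: chase P_j next
        t.target = (j, t.gcut[j] + 1)
        k = j
    if t.target is not None and t.target[0] != i:
        send(t.target[0], t)                  # fall back to an actual token message
        return t.target[0]
    return None
```

The usage pattern is simple: if S_i holds T_k and T_k already carries the needed event, the token is completed locally and no message is sent. Otherwise exactly one message is sent to the slicer owning the target event.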

There are additional optimizations that significantly reduce the number of token messages. In its proposed form, Algorithm 1 performs many redundant computations. This redundancy is caused by computing both J_B(e) and J_B(f) when e ≠ f but J_B(e) = J_B(f). Since both join-irreducible consistent cuts are the same, it suffices for the algorithm to compute only one of them. For this purpose, we first present some additional results:

Lemma 6. $f \in J_B(e) \Rightarrow J_B(f) \subseteq J_B(e)$.

Proof: J_B(f) is the least consistent cut of the computation that satisfies the predicate and contains f. J_B(e) contains f and satisfies the predicate. Therefore J_B(f) ⊆ J_B(e). ■

Lemma 7. $f \in J_B(e) \wedge e \in J_B(f) \Rightarrow J_B(e) = J_B(f)$.

Proof: Apply the previous lemma twice. ■

Lemma 8. $e \rightarrow f \wedge f \in J_B(e) \Rightarrow J_B(f) = J_B(e)$.

Proof: By Lemma 6, f ∈ J_B(e) implies J_B(f) ⊆ J_B(e). Given e → f, the consistency requirement forces J_B(f) to contain e, so by Lemma 6 again J_B(e) ⊆ J_B(f). Hence J_B(f) = J_B(e). ■

To prevent computations that result in identical join-irreducible states, we modify the distributed algorithm of Algorithm 1 to incorporate Lemmas 7 and 8. The modified algorithm is presented in Algorithm 3.

We do not reproduce the functions ReceiveEvent, EvaluateToken, ReceiveStopSignal and SendIfNeeded in the modified algorithm (Algorithm 3), as they remain identical to their earlier versions. The optimized algorithm adds a variable currentE at each slicer process S_i. This local pointer tracks the event e for which S_i is currently computing J_B(e). Tokens also store this information (in token.event); however, the token T_i is not always present at S_i. By keeping currentE updated, S_i can delay the progress of other tokens even in the absence of T_i. It does so whenever it suspects that these tokens may undergo the same J_B(e) computation that is being performed by T_i. To stop possibly duplicate computations, each token maintains a flag called stalled. By setting the stalled flag on a token, a slicer removes the token from the set of waiting tokens, and no updates are performed on tokens in the stalled state. These modifications allow slicer processes to delay the progress of computations on stalled tokens. They also ensure that no two tokens output two equal join-irreducible consistent cuts. The optimized algorithm also uses the type information of events to identify whether an event is a send of a message (denoted by type MSGSEND) or a receive of a message (denoted by type MSGRECV). The modifications are briefly explained after the listings.

Algorithm 3: Optimized Routines at S_i

 1 AddEventToToken(Token t, Event e)
 2   t.gstate[e.pid] ← procState[e.eid]
 3   t.gcut[e.pid] ← e.eid
 4   if (t.pid = i) then   // my token: update current event
 5     currentE ← e.eid
 6     t.event ← e
 7   end
 8   t.depend ← max(t.depend, e.V)
 9 ProcessToken(Token t)
10   if (t.gcut is inconsistent) then
11     if (t.event.type = MSGRECV) then   // message receive event
12       t.target ← t.event.sender
13       if (t.pid = i) then   // my token
14         t.stalled ← true   // stall, wait to hear from target
15       end
16       else   // progress other processes' token
             /* find k s.t. t.gcut[k] < t.depend[k] */
17         t.target ← t.depend[k]
18         SendIfNeeded(t, k)   // use optimized send approach
19       end
20     end
21   end
22   else   // t.gcut is consistent, evaluate its state
23     EvaluateToken(t)
24   end
25 ReceiveToken(Token t)
26   if (t.eval = predtrue) ∧ (t.pid = i) then   // my token, B true
27     output(t.pid, t.eid, t.gcut)
28     UpdateStalledTokens(t)
         /* token waits for the next event */
29     t.target ← t.gcut[i] + 1
30     t.waiting ← true
31   end
32   else   // either inconsistent cut, or predicate false
33     newid ← t.target   // id of t's required event
34     if (t.event ↛ currentE) ∧ (newid > currentE) ∧ (i > t.pid) then
           // no causal dependency, and symmetry breaking
35       t.stalled ← true   // stall the token
36     end
37     else if (∃ f ∈ localEvents : f.id = newid) then   // required event has happened
38       AddEventToToken(t, f)
39       EvaluateToken(t)
40     end
41   end
42 end
43 ReceiveCut(Event event, Vector cutV, Vector stateV)
44   foreach stalled token t at S_i do   // check if t can be updated
45     CopyCutIfNeeded(t, event, cutV, stateV, i)
46   end

Algorithm 4: Helper Routines at S_i

 1 UpdateStalledTokens(Token t')
 2   foreach stalled token t at S_i do
 3     CopyCutIfNeeded(t, t'.event, t'.gcut, t'.gstate, t.pid)
 4   end
 5   if (t'.event.type = MSGSEND) then
         /* send cut details to message recipient process;
            k: id of the message recipient process */
 6     S_k.ReceiveCut(t'.event, t'.gcut, t'.gstate)
 7   end
 8 CopyCutIfNeeded(Token t, Event event, Vector cutV, Vector stateV, int ignore)
 9   if (t.target = event) then   // t was waiting for update from event
10     foreach j in 1 to n s.t. j ≠ ignore do   // copy relevant details
11       t.gcut[j] ← cutV[j]
12       t.depend[j] ← cutV[j]
13       t.gstate[j] ← stateV[j]
14     end
         /* clear stalled state; ensure no duplicate output */
15     t.stalled ← false
16     t.eval ← predfalse
17     if (t.event ∈ cutV) then
           /* cuts are same, move on to next event */
18       t.target ← t.gcut[i] + 1
19       if (t.pid ≠ i) then
20         send t to S_{t.pid}
21       end
22     end
23     else
           /* cuts not same, resume token's computation */
24       ProcessToken(t)
25     end
26   end

Algorithm 3 (Optimized Routines)

Lines 11-15 and 43-46: Suppose the type of an event e ∈ E_i indicates that it is a message receipt. Then the algorithm stalls the token T_i and sets its target to the corresponding message send event f. This step incorporates Lemma 8 in a speculative manner. When the slicer process corresponding to the message send event f finishes computing J_B(f), it informs S_i about the computed cut. On receiving this cut, S_i calls the helper subroutine CopyCutIfNeeded (shown in Algorithm 4). This subroutine checks whether e belongs to J_B(f), in which case the J_B(e) computation is not needed; otherwise, S_i restarts the computation for J_B(e).

Lines 34-36: These steps incorporate Lemma 7 in a speculative manner. The first condition, t.event ↛ currentE, ensures that t is not stalled when the current computation on token T_i is causally dependent on the computation on token t. The second condition is evaluated only if t.event and currentE are not causally related, i.e., they are concurrent. In this case, the check (newid > currentE ∧ i > t.pid) serves two purposes. It ensures that the computation of J_B(t.event) does not progress beyond the current ongoing computation on S_i. It also performs symmetry breaking (so that there is no deadlock) in favor of the token/process with the larger process id. This guarantees that whenever the cuts of two concurrent events are the same, only one of the tokens (the one with the smaller process id) finishes computing the cut, and thus duplicate computations are not performed.

Line 28: Whenever S_i finishes computing J_B(e) for an event e ∈ E_i, it tries to update all the tokens that were speculatively stalled to avoid computing the same cut. The steps involved are explained next.

Algorithm 4 (Helper Routines)

Lines 1-7: Each token stalled because of the event t'.event, either locally or at some other slicer process, is updated if it is present locally and notified otherwise. The notification to other slicer processes is performed by sending them the details of the cut. A stalled token may infer, using the checks of lines 9 and 17 and Lemmas 8 and 7, that its cut (if computed) and t'.gcut would be the same. In that case it copies the cut details of t' and moves on to the next event on its process.
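The stall test on line 34 of Algorithm 3 combines a causality check with a symmetry-breaking rule. The sketch below expresses it with vector clocks, assuming that each event carries a vector clock and that newid and currentE are indices of events on P_i. The function names are hypothetical, and the condition is copied as printed in Algorithm 3 (i > t.pid).

```python
def happened_before(u, v):
    """Vector-clock test for u -> v: u <= v componentwise and u != v."""
    return all(a <= b for a, b in zip(u, v)) and list(u) != list(v)


def should_stall(i, t_pid, t_event_vc, current_vc, newid, current_e):
    """Stall test of line 34 of Algorithm 3, as printed there.

    i           : id of the slicer S_i evaluating the visiting token t
    t_pid       : t.pid, the owner of t
    t_event_vc  : vector clock of t.event
    current_vc  : vector clock of the event currentE at S_i
    newid       : index on P_i of the event t needs next (t.target)
    current_e   : index on P_i of currentE
    """
    independent = not happened_before(t_event_vc, current_vc)  # t.event -/-> currentE
    return independent and newid > current_e and i > t_pid
```

A token whose event causally precedes currentE is never stalled. A token with a concurrent event is stalled only when it needs an event beyond currentE and its owner id is smaller than i.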

A. Example of Optimized Algorithm Execution 

We revisit the example presented in Section III-A to show how the execution of the optimized algorithm differs. When f is reported to S_2, the earlier version of the algorithm has to update the token T_2 and send it to S_1 in order to make the cut on T_2 consistent. The optimized algorithm instead uses the checks on lines 11-15 of Algorithm 3 and determines that f is a message-receive event. Therefore the J_B(f) computation should not be started until the corresponding message-send event's computation is reported to S_2. Thus T_2 is kept in the stalled state until T_1 finishes the J_B(b) computation. When the J_B(b) computation on T_1 finishes with J_B(b) = [b,f], S_1 sends the information of b's join-irreducible cut to S_2, as per line 28 of Algorithm 3. On receiving the cut details, S_2 tries to update its stalled token T_2, and as per lines 9-18 of Algorithm 4 it infers that J_B(f) = J_B(b). Thus, it simply copies the details of the cut as the result for J_B(f) and moves on to computing J_B(g).
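The update performed at S_2 in this example can be sketched with a dict-based token. The sketch assumes that events are (process, 1-based index) pairs, that cuts are per-process event counts, and that process ids 0 and 1 stand for P_1 and P_2. The helper `copy_cut_if_needed` is a hypothetical rendering of CopyCutIfNeeded that omits sending a token home when t.pid differs from the host.

```python
def copy_cut_if_needed(t, event, cut, state, ignore, host):
    """Sketch of CopyCutIfNeeded (Algorithm 4) on a dict-based token.

    cut[j] is the number of events of P_j in the received cut and state[j]
    the local state recorded for P_j; component `ignore` is left untouched.
    Returns None if t was not waiting for `event`, "moved-on" if t's own
    event lies in the received cut, and "resume" if t must continue its
    own search (where Algorithm 4 calls ProcessToken).
    """
    if t["target"] != event:
        return None
    for j in range(len(cut)):
        if j != ignore:                      # copy relevant details
            t["gcut"][j] = cut[j]
            t["depend"][j] = cut[j]
            t["gstate"][j] = state[j]
    t["stalled"] = False                     # clear stalled state
    t["eval"] = False                        # ensure no duplicate output
    p, k = t["event"]
    if cut[p] >= k:                          # t.event is contained in the cut
        t["target"] = (host, t["gcut"][host] + 1)   # move on to the next event
        return "moved-on"
    return "resume"
```

With T_2 stalled at S_2 after f (cut (0, 2), waiting for b), receiving J_B(b) = (2, 2) makes T_2 adopt that cut and retarget to g, the third event on P_2.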

B. Proof of Correctness 

To prove the correctness of the optimized version of the algorithm, it suffices to show that this version does not introduce deadlocks and that, as desired, it does not enumerate duplicate join-irreducible consistent cuts.

Theorem 5. Algorithm 3 cannot lead to deadlocks.

Proof: Assume that the algorithm leads to a deadlock. Then, even though events remain to be processed, a set of tokens is unable to make progress, with every token in this set in the stalled state. There are two possible scenarios:

(a) The tokens are stalled such that, for each token T_i, the event T_i.event, for which it is computing the join-irreducible cut, is concurrent with every other token T_j's event T_j.event. As all the tokens have unique positive ids, there is a unique minimum id. By the check performed at line 34 of Algorithm 3, such a token can never be stalled. This is a contradiction.

(b) All the tokens are stalled such that, for at least one pair of tokens T_i and T_j, T_i.event → T_j.event. In such a case, by the check performed at line 11 of Algorithm 3, T_i can never be stalled at S_j. Thus T_i must be stalled at some other process S_k such that T_i.event is concurrent with T_k.event. But either i < k or i > k, and in either case the result of case (a) guarantees that T_i eventually makes progress. Thus T_i does not remain stalled forever, which is again a contradiction. ■

Theorem 6. If J_B(e) = J_B(f) for two distinct events e and f, then only one of these cuts is enumerated by Algorithm 3.

Proof: Given that e ≠ f, either (a) e ∥ f (the events are concurrent), or (b) e → f (without loss of generality). If e ∥ f, then as per line 34, one of their tokens is stalled on the other's process. Again without loss of generality, assume that f's token was stalled on e's process. Since there can be no deadlocks, the J_B(e) computation eventually finishes. It then updates f's token to indicate that it should move to the successor of f (as per the steps in Algorithm 4). Hence, J_B(f) is not enumerated.

If e → f, then f's token is stalled on its own process until the J_B(e) computation completes. At that point, f's token is made to move to the successor of f. Again, J_B(f) is not enumerated. ■

C. Analysis 

Each token T_i computes J_B(e) for every event e ∈ E_i. If there are |E| events in the system, then in the worst case T_i does O(n|E|) work, because it takes O(n) time to process one event. Here we assume that evaluating B on a given global state takes O(n) time. There are n tokens in the system, hence the total work performed is O(n²|E|). Since there are n slicer processes and n tokens, the average work performed is O(n|E|) per process. In comparison, the centralized algorithm (either online or offline) requires the slicer process to perform O(n²|E|) work.
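The work bound above can be tallied step by step. The derivation below only restates this paragraph's own assumptions: processing one event costs O(n) time, and each token handles each of the |E| events at most once.

```latex
\begin{aligned}
W_{\text{token}}  &= |E| \cdot O(n) = O(n\,|E|) \\
W_{\text{total}}  &= n \cdot W_{\text{token}} = O(n^{2}|E|) \\
W_{\text{slicer}} &= W_{\text{total}} / n = O(n\,|E|)
  \qquad \text{vs.} \qquad W_{\text{central}} = O(n^{2}|E|)
\end{aligned}
```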

Let |S| be the maximum number of bits required to represent the local state of a process. The actual value of |S| depends on the predicate under consideration, since the predicate determines the number and type of variables needed to capture the information for its detection. The centralized online algorithm requires O(|E||S|) space in the worst case, and all of this space is needed on a single (central slicer) process. For a large computation, this space requirement can be limiting. The distributed algorithm proposed above consumes only O(|E_i||S|) space per slicer S_i. Thus, when events are spread evenly across the processes (|E_i| ≈ |E|/n), the per-slicer space consumption is reduced by a factor of O(n).

A token can move at most once per event. Hence, in the worst case, the message complexity is O(|E|) per token. The total message complexity of the distributed algorithm presented here is therefore O(n|E|) over all tokens. The message complexity of the centralized online slicing algorithm is O(|E|), because all the event details are sent to one (central) slicing process. However, for conjunctive predicates, the message complexity of the optimized version of the distributed algorithm is also O(|E|). With speculative stalling of tokens, only unique join-irreducible cuts are computed. This means that, for conjunctive predicates, a token T_i leaves (and returns to) S_i only O(|E_i|) times. As there are n tokens, the overall message complexity of the optimized version for conjunctive predicates is O(|E|).
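The message counts above can be summed explicitly. The derivation uses only the stated per-token bounds and the fact that every event belongs to exactly one process, so that the sizes |E_i| add up to |E|.

```latex
\begin{aligned}
M_{\text{token}} &= O(|E|), \qquad M_{\text{total}} = n \cdot O(|E|) = O(n\,|E|) \\
M_{\text{opt, conj}} &= \sum_{i=1}^{n} O(|E_i|) = O\Big(\sum_{i=1}^{n} |E_i|\Big) = O(|E|)
\end{aligned}
```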

V. Evaluation & Discussion 

As indicated by the analysis in Section IV-C, the distributed algorithm reduces the worst-case complexity of the total work per slicer process by a factor of n, from O(n²|E|) to O(n|E|). To better understand the actual gains of the distributed algorithm, we implemented both the centralized and the distributed algorithms. We then evaluated their performance experimentally by slicing the same set of computations.

The experiments were performed on a 64-bit, 8-processor (2.3 GHz) machine with 4 GB memory, running the Linux 3.2.0-32 kernel. Each process is allowed to run until its local program counter reaches a fixed upper bound; for the reported results, this bound was set to 100. Message activity was randomized: after each local state change, a process could send a message, with a probability of 0.8, to a randomly chosen process. The number of processes in the computation was varied from 2 to 10. We monitor the total object sizes of the centralized slicer and of each distributed slicer process while they compute the slice for an ongoing computation. We also monitor the total number of messages received by all the slicer processes during the online computations. For any distributed slicer S_i, the reported number of messages includes the token messages it received from other slicers. We compare the centralized and distributed algorithms in terms of the maximum values of both the consumed space and the total messages per slicer process.
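A workload of the kind described above can be generated with a small simulator. The paper does not specify how messages are scheduled or delivered, so the sketch below makes several hypothetical choices. A random scheduler picks a process at each step. Pending messages are delivered in the order they were sent to each receiver. A process with both pending messages and remaining local steps receives with probability 0.5. Vector clocks are attached so that the output could feed a slicer.

```python
import random


def random_computation(n, pc_bound=100, p_send=0.8, seed=0):
    """Hypothetical generator of a random computation with vector clocks.

    Each of n processes performs pc_bound local state changes; after each
    one it sends, with probability p_send, a message to a randomly chosen
    other process. Returns a list of (pid, kind, vector_clock) events,
    kind in {"LOCAL", "MSGSEND", "MSGRECV"}.
    """
    rng = random.Random(seed)
    vc = [[0] * n for _ in range(n)]
    pc = [0] * n                          # local program counters
    inbox = [[] for _ in range(n)]        # pending (sender, timestamp) per receiver
    events = []

    def tick(p, kind):
        vc[p][p] += 1                     # every event advances p's own entry
        events.append((p, kind, tuple(vc[p])))

    while any(c < pc_bound for c in pc) or any(inbox):
        p = rng.randrange(n)
        if inbox[p] and (pc[p] >= pc_bound or rng.random() < 0.5):
            _, stamp = inbox[p].pop(0)    # deliver the oldest pending message
            vc[p] = [max(a, b) for a, b in zip(vc[p], stamp)]
            tick(p, "MSGRECV")
        elif pc[p] < pc_bound:
            pc[p] += 1
            tick(p, "LOCAL")              # local state change
            if n > 1 and rng.random() < p_send:
                q = rng.choice([r for r in range(n) if r != p])
                tick(p, "MSGSEND")
                inbox[q].append((p, tuple(vc[p])))
    return events
```

Every message sent is eventually received before the generator returns, so the resulting computation has no in-transit messages at its final cut.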

Fig. 3(a) plots the ratio of the maximum space consumed by any distributed slicer object to the maximum space used by the centralized slicer (for the same computation), against the number of processes in the computation. The space consumption is evaluated at predetermined checkpoints that are symmetric for both the centralized and the distributed algorithms. Fig. 3(b) presents the maximum number of messages received by the centralized slicer and by any distributed slicer for the same instances of computations.

VI. Related Work 

The distributed algorithm presented in this paper constructs the slice of a distributed computation with respect to a regular state-based predicate. The constructed slice can then be used to determine whether some consistent cut of the computation satisfies the predicate. This is referred to as the problem of detecting a predicate under the possibly modality [1]. In [1], a predicate is detected by exploring the complete lattice of consistent cuts in a breadth-first manner. Alagar et al. [6] use a depth-first traversal of the computation lattice to reduce space complexity. The algorithms in [1] and [6] can handle arbitrary predicates, but in general have exponential time complexity. In contrast, the slicing algorithm presented in this paper for a regular predicate has polynomial time complexity.

In this paper, we assume a static distributed system. Predicate detection algorithms have been proposed for dynamic systems, where processes may leave or join (e.g., [18], [19], [20], [21], [22]). However, these algorithms detect restricted classes of predicates, such as stable predicates and conjunctive local predicates, which are less general than regular predicates. In computation slicing, we analyze a single trace (or execution) of a distributed program for any violation of the program's specification. Model checking (cf. [23]) is a formal verification technique that determines whether (all traces of) a program meet its specification. Model checking algorithms conduct reachability analysis on the state-space graph, and have a time complexity that is exponential in the number of processes.

Partial order methods (cf. [24]) aim to alleviate the state-explosion problem by minimizing the state space for predicate detection. This is done by exploring only a subset of the interleavings of concurrent events in a computation, instead of all possible interleavings. However, predicate detection algorithms based on partial order methods still have exponential time complexity in the worst case. In this paper, the focus is on generating the slice with respect to a predicate. Partial order methods such as [25] can be used in conjunction with slicing to explore the state space of a slice more efficiently [26].

The work presented in this paper is related to runtime verification (cf. [27]), which involves analyzing a run of a program to detect violations of a given correctness property. The input program is instrumented, and the trace resulting from its execution is examined by a monitor that verifies its correctness. Some examples of runtime verification tools are Temporal Rover [28], Java-MaC [29], JPaX [30], JMPaX [31] and jPredictor [32]. The Temporal Rover, Java-MaC and JPaX tools model the execution trace as a total order of events, which is then examined for violations. In the JMPaX and jPredictor tools, as in our algorithm, the trace is modeled as a partial order of events. A lattice of consistent cuts of the computation is then generated, which is searched by the monitor. Further, these tools generate states not observed in the current trace to predict errors that may occur in other runs, thereby increasing the size of the computation lattice. Chen et al. [33] note that computation slicing can make tools like jPredictor more efficient by removing redundant states from the lattice. All of these tools are centralized in nature: the events are collected at a central monitoring process. Sen et al. [34] present a decentralized algorithm that monitors a program's execution, but it can detect only a subset of safety properties. The distributed algorithm presented by Bauer et al. [35] can handle a wider class of predicates, but requires the underlying system to be synchronous.

VII. Conclusion 

In this paper, we presented a distributed online algorithm for performing computation slicing, a technique that abstracts the computation with respect to a regular predicate. The resulting abstraction (slice) is usually much smaller in size, sometimes exponentially so. For regular predicates, detecting the predicate on the abstracted computation alone is guaranteed to detect it in the full computation, which leads to an efficient detection mechanism. By distributing the task of abstraction among all the processes, our distributed algorithm reduces both the space required and the computational load on a single process by a factor of O(n). We also presented an optimized version of the distributed algorithm that does not perform redundant computations and requires fewer messages. The results of the experimental evaluation (available in the extended version of this paper at [11]) compare the performance of our distributed algorithm with that of the existing centralized algorithm.

Fig. 3: Comparison of Memory Usage and Total Messages for Conjunctive Predicate